Effect of Virtual Channels and Memory Organization on Cache-coherent Shared-memory Multiprocessors
Abstract
In this paper, the performance of a wormhole-routed 2-D torus network with virtual channels has been evaluated for cache-coherent shared-memory multiprocessors, using execution-driven simulation of various applications. The traffic in such systems is very different from the traffic in a message-passing environment: it is characterized by traffic bursts, one-to-many and many-to-one traffic, and small fixed-length messages. We show the impact of various network parameters, such as the number of virtual channels, the number of flit buffers per virtual channel, and the number of internal links. We have also considered low-order and high-order interleaving of memory blocks across nodes to show its impact on network performance. The study shows that 4 virtual channels per link is the most efficient choice for 2-D torus networks. The number of flit buffers per virtual channel also has a considerable impact, and 2 to 4 flit buffers are usually enough. The number of internal links also affects performance for applications, such as MP3D, that generate heavy contention for shared variables. A larger number of internal links is also useful with high-order interleaved memory, to reduce hot spots at the communication interfaces of favorite nodes.
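Why high-order interleaving concentrates traffic on a few nodes can be seen from how a memory block's address selects the node that holds it. The sketch below is illustrative only and is not the paper's simulator. The node count (16), the per-node memory size (1024 blocks), and the position of the shared array are hypothetical values chosen for the example. Under low-order interleaving, consecutive blocks rotate across nodes. Under high-order interleaving, each node holds one contiguous range of blocks.

```python
from collections import Counter


def home_low_order(block: int, num_nodes: int) -> int:
    """Low-order interleaving: consecutive blocks map to consecutive nodes."""
    return block % num_nodes


def home_high_order(block: int, blocks_per_node: int) -> int:
    """High-order interleaving: each node owns one contiguous range of blocks."""
    return block // blocks_per_node


def home_histogram(blocks, home_fn) -> Counter:
    """Count how many of the given blocks map to each node."""
    return Counter(home_fn(b) for b in blocks)


if __name__ == "__main__":
    NUM_NODES = 16          # hypothetical machine size, e.g. a 4x4 torus
    BLOCKS_PER_NODE = 1024  # hypothetical per-node memory size, in blocks
    # Hypothetical shared array: 64 consecutive blocks starting at block 2048.
    shared = range(2048, 2048 + 64)
    low = home_histogram(shared, lambda b: home_low_order(b, NUM_NODES))
    high = home_histogram(shared, lambda b: home_high_order(b, BLOCKS_PER_NODE))
    print("low-order :", dict(sorted(low.items())))
    print("high-order:", dict(sorted(high.items())))
```

With these assumed sizes, low-order interleaving places four of the 64 shared blocks on each of the 16 nodes. High-order interleaving places all 64 on node 2. Every request for that array therefore converges on a single node's network interface, which is the kind of hot spot the abstract says additional internal links help relieve.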
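The benefit of adding virtual channels to a wormhole-routed link can be illustrated with a toy model of a single physical link. It is a minimal sketch under simplifying assumptions and not the network model used in the paper. Each virtual channel has its own flit buffer of `depth` flits at the receiving router. The link sends at most one flit per cycle, choosing round-robin among the channels that have a flit waiting and free buffer space. The class `VirtualChannelLink`, the two 8-flit messages, and the 20-cycle horizon are all hypothetical. Message A is blocked further downstream, so its buffer never drains. Message B is free to proceed.

```python
from collections import deque


class VirtualChannelLink:
    """Hypothetical model of one physical link shared by several virtual channels.

    Each virtual channel (VC) owns a separate downstream flit buffer of `depth`
    flits. Per cycle the link carries at most one flit, picked round-robin among
    VCs that have a flit waiting and free space in their downstream buffer.
    """

    def __init__(self, num_vcs: int, depth: int):
        self.depth = depth
        self.downstream = [deque() for _ in range(num_vcs)]
        self.next_vc = 0

    def cycle(self, pending: list) -> None:
        n = len(self.downstream)
        for i in range(n):
            vc = (self.next_vc + i) % n
            if pending[vc] and len(self.downstream[vc]) < self.depth:
                self.downstream[vc].append(pending[vc].popleft())
                self.next_vc = (vc + 1) % n
                return


def flits_of_b_delivered(num_vcs: int, depth: int, cycles: int = 20) -> int:
    """Count B's flits that get past the link while message A is blocked.

    With one VC, B queues behind A on the same channel. With two or more VCs,
    A and B use different channels, so B can bypass the blocked A.
    """
    link = VirtualChannelLink(num_vcs, depth)
    pending = [deque() for _ in range(num_vcs)]
    a_vc, b_vc = 0, num_vcs - 1
    pending[a_vc].extend(("A", i) for i in range(8))
    pending[b_vc].extend(("B", i) for i in range(8))
    delivered = 0
    for _ in range(cycles):
        link.cycle(pending)
        buf = link.downstream[b_vc]
        # The next router forwards B's flits one per cycle and never A's (A is blocked).
        if buf and buf[0][0] == "B":
            buf.popleft()
            delivered += 1
    return delivered


if __name__ == "__main__":
    for vcs in (1, 2, 4):
        print(vcs, "VC(s):", flits_of_b_delivered(vcs, depth=2), "of 8 B flits delivered")
```

With one virtual channel, none of B's flits cross the link in 20 cycles, because they wait behind A's stalled flits. With two or four virtual channels, all 8 are delivered. This toy model ignores routing, credit latency, and the number of internal links, so it only motivates why the number of virtual channels and flit buffers matters. It does not reproduce the paper's finding that 4 channels per link is most efficient.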
Publication date: 1996